Online Belief Propagation for Topic Modeling
Authors
Abstract
Online topic modeling algorithms can not only extract topics from big data streams with constant memory requirements, but also detect topic shifts as the data stream flows. Fast convergence is a desirable property for batch learning topic models such as latent Dirichlet allocation (LDA), and it can further facilitate the development of fast online topic modeling algorithms for big data streams. In this paper, we present a novel and easy-to-implement fast belief propagation (FBP) algorithm that accelerates convergence for batch learning of LDA when the number of topics is large. FBP uses a dynamic scheduling scheme for asynchronous message passing, which passes only the most important subset of topic messages at each iteration for speed. From FBP, we derive an online belief propagation (OBP) algorithm that infers the topic distribution of previously unseen documents incrementally by online gradient descent. We show that OBP converges to a local optimum of the LDA objective function within the online stochastic optimization framework. Extensive empirical studies demonstrate that OBP significantly reduces the learning time and achieves a much lower predictive perplexity than several state-of-the-art online algorithms for LDA, including online variational Bayes (OVB) and online Gibbs sampling (OGS).
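To make the abstract's description concrete, the following Python sketch outlines one possible OBP-style update for a single incoming document: a few rounds of message passing restricted to the top-k topic messages per word (the FBP-style dynamic scheduling), followed by an online gradient-style step on the global topic-word matrix. The function name obp_update, the hyperparameters (alpha, beta, top_k, rho), and the exact pruning and update rules are illustrative assumptions rather than the authors' implementation.

```python
import numpy as np

def obp_update(phi, word_ids, word_counts, alpha=0.1, beta=0.01,
               top_k=10, inner_iters=5, rho=0.05):
    """One hedged, illustrative online step for a single unseen document.

    phi         : (K, V) global topic-word weight matrix
    word_ids    : indices of the distinct words in the document
    word_counts : counts for those words
    top_k       : number of topic messages kept per word (dynamic scheduling)
    rho         : online gradient-descent step size
    """
    word_ids = np.asarray(word_ids)
    word_counts = np.asarray(word_counts, dtype=float)
    K, _ = phi.shape
    W = len(word_ids)
    mu = np.full((W, K), 1.0 / K)          # per-word topic messages
    for _ in range(inner_iters):
        # Document-level topic weights implied by the current messages.
        theta = (word_counts[:, None] * mu).sum(axis=0) + alpha
        for w, v in enumerate(word_ids):
            doc_part = theta - word_counts[w] * mu[w]   # exclude own contribution
            msg = (phi[:, v] + beta) * doc_part
            # Keep only the top_k topic messages and zero out the rest.
            keep = np.argsort(msg)[-top_k:]
            pruned = np.zeros(K)
            pruned[keep] = msg[keep]
            mu[w] = pruned / pruned.sum()
    # Online gradient-style step: blend the old matrix with this document's
    # expected topic-word counts (rescaling to corpus size is omitted here).
    grad = np.zeros_like(phi)
    grad[:, word_ids] = word_counts[None, :] * mu.T
    return (1.0 - rho) * phi + rho * grad
```

A streaming loop would call obp_update once per arriving document and decay rho over time, which is the standard way online stochastic optimization reaches a local optimum.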
Similar resources
Residual Belief Propagation for Topic Modeling
Fast convergence speed is a desired property for training latent Dirichlet allocation (LDA), especially in online and parallel topic modeling for massive data sets. This paper presents a novel residual belief propagation (RBP) algorithm to accelerate the convergence speed for training LDA. The proposed RBP uses an informed scheduling scheme for asynchronous message passing, which passes fast-co...
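The informed scheduling mentioned here can be pictured as a priority queue ordered by message residuals, so that the messages that changed the most are refreshed first. The sketch below is a generic illustration of that idea under stated assumptions; messages, compute_message, and dependents are hypothetical placeholders, not an interface from the paper.

```python
import heapq
import numpy as np

def residual_schedule(messages, compute_message, dependents,
                      max_updates=10000, tol=1e-6):
    """Residual-ordered asynchronous message passing (illustrative only).

    messages        : dict edge_id -> current message (numpy vector)
    compute_message : (edge_id, messages) -> recomputed message for that edge
    dependents      : edge_id -> edges whose messages read the given edge
    """
    def residual(e):
        return float(np.abs(compute_message(e, messages) - messages[e]).sum())

    # Max-heap via negated residuals: the largest-change edge is popped first.
    heap = [(-residual(e), e) for e in messages]
    heapq.heapify(heap)
    for _ in range(max_updates):
        if not heap:
            break
        neg_r, e = heapq.heappop(heap)
        if -neg_r < tol:
            break                      # all remaining residuals are negligible
        messages[e] = compute_message(e, messages)
        # Updating e changes the residuals of the edges that depend on it;
        # stale heap entries are simply tolerated for brevity.
        for d in dependents(e):
            heapq.heappush(heap, (-residual(d), d))
    return messages
```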
Towards Big Topic Modeling
To solve the big topic modeling problem, we need to reduce both time and space complexities of batch latent Dirichlet allocation (LDA) algorithms. Although parallel LDA algorithms on the multi-processor architecture have low time and space complexities, their communication costs among processors often scale linearly with the vocabulary size and the number of topics, leading to a serious scalabi...
Communication-Efficient Parallel Belief Propagation for Latent Dirichlet Allocation
This paper presents a novel communication-efficient parallel belief propagation (CE-PBP) algorithm for training latent Dirichlet allocation (LDA). Based on the synchronous belief propagation (BP) algorithm, we first develop a parallel belief propagation (PBP) algorithm on the parallel architecture. Because the extensive communication delay often causes a low efficiency of parallel topic modelin...
A topic modeling toolbox using belief propagation
Latent Dirichlet allocation (LDA) is an important hierarchical Bayesian model for probabilistic topic modeling, which attracts worldwide interest and touches on many important applications in text mining, computer vision and computational biology. This paper introduces a topic modeling toolbox (TMBP) based on the belief propagation (BP) algorithms. The TMBP toolbox is implemented in MEX C++/Matlab...
A New Approach to Speeding Up Topic Modeling
Latent Dirichlet allocation (LDA) is a widely-used probabilistic topic modeling paradigm, and recently finds many applications in computer vision and computational biology. In this paper, we propose a fast and accurate batch algorithm, active belief propagation (ABP), for training LDA. Usually batch LDA algorithms require repeated scanning of the entire corpus and searching the complete topic s...
Journal: CoRR
Volume: abs/1210.2179
Pages: -
Publication year: 2012